A Word-to-Word Model of Translational Equivalence
ثبت نشده
چکیده
Many multilingual NLP applications need to translate words between different languages, but cannot afford the computational expense of inducing or applying a full translation model. For these applications, we have designed a fast algorithm for estimating a partial translation model, which accounts for translational equivalence only at the word level . The model's precision/recall trade-off can be directly controlled via one threshold parameter. This feature makes the model more suitable for applications that are not fully statistical. The model's hidden parameters can be easily conditioned on information extrinsic to the model, providing an easy way to integrate pre-existing knowledge such as partof-speech, dictionaries, word order, etc.. Our model can link word tokens in parallel texts as well as other translation models in the literature. Unlike other translation models, it can automatically produce dictionary-sized translation lexicons, and it can do so with over 99% accuracy. 1 I n t r o d u c t i o n Over the past decade, researchers at IBM have developed a series of increasingly sophisticated statistical models for machine translation (Brown et al., 1988; Brown et al., 1990; Brown et al., 1993a). However, the IBM models, which attempt to capture a broad range of translation phenomena, are computationally expensive to apply. Table look-up using an explicit translation lexicon is sufficient and preferable for many multilingual NLP applications, including "crummy" MT on the World Wide Web (Church & I-Iovy, 1993), certain machine-assisted translation tools (e.g. (Macklovitch, 1994; Melamed, 1996b)), concordancing for bilingual lexicography (Catizone et al., 1993; Gale & Church, 1991), computerassisted language learning, corpus linguistics (Melby. 1981), and cross-lingual information retrieval (Oard &Dorr , 1996). In this paper, we present a fast method for inducing accurate translation lexicons. The method assumes that words are translated one-to-one. This assumption reduces the explanatory power of our model in comparison to the IBM models, but, as shown in Section 3.1, it helps us to avoid what we call indirect associations, a major source of errors in other models. Section 3.1 also shows how the oneto-one assumption enables us to use a new greedy competitive linking algorithm for re-estimating the model's parameters, instead of more expensive algorithms that consider a much larger set of word correspondence possibilities. The model uses two hidden parameters to estimate the confidence of its own predictions. The confidence estimates enable direct control of the balance between the model's precision and recall via a simple threshold. The hidden parameters can be conditioned on prior knowledge about the bitext to improve the model's accuracy. 2 C o o c c u r r e n c e With the exception of (Fung, 1998b), previous methods for automatically constructing statistical translation models begin by looking at word cooccurrence frequencies in bitexts (Gale & Church, 1991; Kumano & Hirakawa, 1994; Fung, 1998a; Melamed, 1995). A b i t ex t comprises a pair of texts in two languages, where each text is a translation of the other. Word co-occurrence can be defined in various ways. The most common way is to divide each half of the bitext into an equal number of segments and to align the segments so that each pair of segments Si and Ti are translations of each other (Gale & Church, 1991; Melamed, 1996a). Then, two word tokens (u, v) are said to c o o c c u r in the
منابع مشابه
Equivalence in Technical Texts: The Case of Accounting Terms in English-Persian Dictionaries
Translating accounting documents, in general, and accounting terminology, in particular, is not a simple task, especially when the new terms keep created in pace with accounting developments. This study was carried out to find the most common and preferable ways to translate accounting terms from English into Persian. Also, an attempt was made to identify the frequently used patterns of word-fo...
متن کاملEquivalence in Technical Texts: The Case of Accounting Terms in English-Persian Dictionaries
Translating accounting documents, in general, and accounting terminology, in particular, is not a simple task, especially when the new terms keep created in pace with accounting developments. This study was carried out to find the most common and preferable ways to translate accounting terms from English into Persian. Also, an attempt was made to identify the frequently used patterns of word-fo...
متن کاملA Word-to-Word Model of Translational Equivalence
Many multilingual NLP applications need to translate words between different languages, but cannot afford the computational expense of inducing or applying a full translation model. For these applications, we have designed a fast algorithm for estimating a partial translation model, which accounts for translational equivalence only at the word level. The model’s precision/recall trade-off can b...
متن کاملStatistical Alignment Models for Translational Equivalence
The ever-increasing amount of parallel data opens a rich resource to multilingual natural language processing, enabling models to work on various translational aspects like detailed human annotations, syntax and semantics. With efficient statistical models, many cross-language applications have seen significant progresses in recent years, such as statistical machine translation, speech-to-speec...
متن کاملیک مدل موضوعی احتمالاتی مبتنی بر روابط محلّی واژگان در پنجرههای همپوشان
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
متن کاملA Model of E-Loyalty and Word-Of-Mouth based on e-trust in E-banking services (Case Study: Mellat Bank)
Customers extend robust trust to a business when they believe the business puts their interests first. Good experience of banking services and recommendations of other customers can increase trust. Loyalty and Word of mouth (WOM) is accepted as key factors successes of marketing. This paper seeks to discover the affecting factors on positive word of mouth and loyalty based on trust enhancement ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002